This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art results on the NYU-v2 depth dataset, with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in video sequences, which could be processed in real time using appropriate hardware such as an FPGA.